In this this investigation, I looked at the characteristics of prosper loan variables to see what determins if a loan will perform well or not.
I explored certain variables and how they interact to try to see if they can affect the loan status and to what extent. Overall my goal was to try to see how the the different variables in the dataset relate with Prosper Score and Credit Grade, because these two variables determins are meant to 'Predict' loan performance.
The data set contains 113,937 loans with 81 variables on each loan, including loan amount, borrower rate (or interest rate), current loan status, borrower income, Prosper Score(A custom risk score which originated after July 2009, it was built using historical Prosper data. The score ranges from 1-10, with 10 being the best, or lowest risk score.), Credit Grade(The Credit rating used pre-2009 period, when the listing went live.) and many others.
The data set was last updated on 03/11/2014
From the map visualisation, we can see that most Prosper borrowers come from California.
I wanted to find out why and I did a little search and it turns out, the company was founded in California, USA in 2005, it's headquarters is also San Francisco.
I thought the situation of their offices was a possible reason but then I checked their other office location: Phoenix, Arizona and there, they had just 1901 Prosper Borrowers so i abandoned that theory.
Lastly I checked U.S.A's population distribution by state and the Prosper Borrower number mirrors that.
variable = 'ListingKey'
gdf_test_df=gdf_test_df.sort_values(by=variable, ascending=True)
colormap = folium.LinearColormap(colors=["red","orange","yellow","green"],vmin=gdf_test_df.loc[gdf_test_df[variable]>0, variable].min(),
vmax=gdf_test_df.loc[gdf_test_df[variable]>0, variable].max()).to_step(n=5)
centroid=gdf_test_df.geometry.centroid
m=folium.Map(location=[centroid.y.mean(), centroid.x.mean()], zoom_start=4, tiles='StamenWatercolor')
#tooltip=folium.features.Tooltip()
#folium.
folium.GeoJson(gdf_test_df[['geometry','name',variable]],
name="United States of America",
style_function=lambda x: {"weight":2, 'color':'black','fillColor':colormap(x['properties'][variable]), 'fillOpacity':0.2},
highlight_function=lambda x: {'weight':3, 'color':'black'},
smooth_factor=2.0,
#tooltip=tooltip
tooltip=folium.features.GeoJsonTooltip(fields=['name',variable],
aliases=['State','Number of loans by state'],
labels=True,
sticky=True,
)
).add_to(m)
colormap.add_to(m)
folium.LayerControl(autoZIndex=False, collapsed=False).add_to(m)
m
From the word cloud and the occupation count plot, When we remove two highest Occupation classifications, we see that Computer Programmer is the Highest and it gradually reduces till the last which is Student-Technical School. From out dataset, the two most common Occupations are 'other' and 'professional';which is a broad classification and doesnt tell us much.
img_link2='https://raw.githubusercontent.com/tonyodiba/Prosper_Loans_data_exploration/master/files/326-3267244_prosper-loans-on-twitter-prosper-marke.png'
icon=Image.open(requests.get(img_link2,stream=True).raw).convert("RGBA")
mask = Image.new("RGBA", icon.size, (255,255,255))
mask.paste(icon,icon)
mask = np.array(mask)
wc=loan.Occupation.value_counts().to_dict()
wordcloud=WordCloud(width = 400, height = 400,
background_color='white',max_font_size=300,
random_state=123,stopwords=STOPWORDS,mask=mask).generate_from_frequencies(wc)
#wordcloud.recolor(color_func=color_func, random_state=7)
wordcloud.recolor(color_func=prosper_hue,random_state=53)
plt.figure(figsize=(10,10),facecolor = 'white', edgecolor='blue')
plt.imshow(wordcloud, interpolation='bilinear')
plt.tight_layout(pad=2)
plt.axis("off")
plt.show()
The Income Bracket borrowing the most from Prosper are the \$25,000-\$49,000, followed closely \$50,000-\$74,000 Income Bracket. When grouped by Employment Status, people listed as Employed borrowed more and "Retired" borrowed least.
loan.IncomeRange=loan.IncomeRange.astype('category')
plt.figure(figsize = [15, 4.5])
plt.subplots_adjust(wspace = 0.3)
plt.subplot(121)
set_my_style()
sb.set_context('notebook')
ax=loan.IncomeRange.value_counts().plot.bar(alpha=0.9,color=prosper_hue2[1])
plt.title('Income Rate Distribution')
plt.xlabel('Income Bracket')
plt.ylabel('Count')
plt.subplot(122)
ax=loan.EmploymentStatus.value_counts().plot.bar(alpha=0.9,color=prosper_hue2[0])
plt.title('Employment Status')
#plt.xlabel('Loan Status')
plt.ylabel('Count')
plt.xlabel('Employment Status')
plt.show()
From the plot of both ProsperScore and CcreditGrade vs LoanStatus(i.e Performing and NonPerforming Loans), I observed that Prosper Score was wayy more efficient at determining the performance of a loan than creditGrade was. This indicates that the risk management system at Prosper has actually gotten better after the restructuring.
plt.figure(figsize=(15,4.4))
plt.subplots_adjust(wspace = 0.5)
plt.subplot(121)
ax=sb.countplot(x='ProsperScore',data=loan,hue='Loan_stat1',hue_order=['Performing','NonPerforming'],alpha=0.9,
palette=['#FE7702', '#E10C79'])
plt.subplot(122)
ax=sb.countplot(x='CreditGrade',data=loan,hue='Loan_stat1',hue_order=['Performing','NonPerforming'],alpha=0.9,
palette=['#FE7702', '#E10C79'])
For BorrowerAPR vs DebttoIncomeRatio, firstly, most people tend to have debt-to-income ratios below 1, regardless of risk category. Iobserved alos that AA category loans seem to have a lower BorrowerAPRs and a smaller range of debt-to-income ratios, both of which indicate less risk and APR increases as the rating gets riskier. Most people tend to have debt-to-income ratios below 1, regardless of risk category.
sb.set_context('talk')
ax=sb.lmplot(data=loan,size=7,aspect=2,x='DebtToIncomeRatio',y='BorrowerAPR',hue='CreditGrade',
fit_reg=False,palette=prosper_hue_medium_plasma_r)
ax.set_titles('DebtToIncomeRatio vs BorrowerAPR grouped by CreditGrade')
plt.show()
This plot of the StatedMonthlyIncomw vs LoanOriginalAmount categorized by Risk profile and grouped by colored representing the different Prosper Scores does a good job of showing us how the StatedMonthlyIncomw and LoanOriginalAmount are related.
From the plot we can see that the ordinary leeast square regression line shows an upward trend, meaning that as statedMonthly income increased, the loan amoint increased. this trend is more apparent in the prosperScore 10 group.
It is an interactive plot please highlight the ProsperScore 10 to visualize it.
px.scatter(loan_no_nan,width=1000,height=700,title='StatedMonthlyIncomw vs LoanOriginalAmount categorized by Riskprofile and ProsperScore',trendline='ols',trendline_color_override='Black',x='LoanOriginalAmount',
y='StatedMonthlyIncome',color='ProsperScore',log_x=True,facet_col='Loan_risk',
log_y=True)
This plot shows Borrower Rate vs Prosper Score grouped by Performing vs Non-Performing loans. I classified the LoanStatus into Performing and NonPerforming Categories.
The performing Category consists of;
- The 'Completed' class
- The 'Current' class (since the haven't defaulted)
- The 'FinalPaymentInProgress' class
BorrowerAPR decreases with improving Prosper Score. For performing loans, it was observed that prosperScore of >10 enjoy the least BorrowerAPR, with a relatively short range of about 0.05-0.14, and it seems Home owners enjoy an even smaller BorrowerAPR compared to non-home owners.
For the non-performing loan group, from a prosperScore of 0-5, the borrower Annual Percentage Rate is about the same with variation in range, however from the rating score of >6, it decreases with improving score. Also here, the BorrowerAPR for home owners appears to be lower than for non-home owners.
px.box(loan_no_nan,width=1000,height=450,x='ProsperScore',color='IsBorrowerHomeowner',
y='BorrowerAPR',log_x=False,facet_col='Loan_stat1',
log_y=False)
This is a fun plot that shows the interaction of 9 different variables;
- 'Original Loan Amount'
- 'Loan Status'
- 'DebtToIncomeRatio'
- 'Monthly Income(Stated)'
- 'Recommendations'
- 'Investors'
- 'TotalProsperLoans'
- 'Prosper Score'
From the 'Parallel plot showing interaction of 9 different variables' I observed that they were a lot of loans which had a prosperScore of 9,10 and 11 that were either chargedoff or defaulted,some of these loans with original loan amount between \$20,000 and \$25,000 had monthlyIncome Amounts close to the borrowed amounts.
Feel free to fiddle with the plot to see what insight you may uncover!
init_notebook_mode(connected=True)
df=loan_parallel
data = [
go.Parcoords(
line = dict(color = df['Par_col'],
colorscale = [[0,'#84807A'],[0.5,"#FE7702"],[1,'#E10C79']]),
#line = dict(color = df['colorVal'],
# colorscale = 'Jet',
# showscale = True,
# reversescale = True,
# cmin = -4000,
# cmax = -100),
dimensions = list([
dict(range = [100,35000],
constraintrange=[20000,25000],
label = 'Original Loan Amount', values = df['LoanOriginalAmount']),
dict(tickvals = [1,2,3,4,5],
constraintrange=[4,5],
ticktext = ['Completed', 'Current', 'PastDue', 'Defaulted', 'Chargedoff'],
label = 'Loan Status', values = df['Par_col2']),
dict(range=[0.0,10.01],
label = 'DebtToIncomeRatio', values = df['DebtToIncomeRatio']),
dict(range = [0.0,175500],#monthly income range too high 0.0-1755000 reduce it
visible = True,
label = 'Monthly Income(Stated)', values = df['StatedMonthlyIncome']),
dict(range = [0,40],
label = 'Recommendations', values = df['Recommendations']),
dict(range = [1,1189],
label = 'Investors', values = df['Investors']),
dict(range = [0,8],
label = 'TotalProsperLoans', values = df['TotalProsperLoans']),
dict(range = [-1,12],
constraintrange=[9,11],
tickvals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0],
label = 'Prosper Score', values = df['ProsperScore'])
#[0.0,646285.0] also add incomerange
])
)
]
layout = go.Layout(
title='Parallel plot showing interaction of 9 different variables',
plot_bgcolor = '#E5E5E5',
paper_bgcolor = '#E5E5E5',
autosize=False,
width=1000,
height=520
)
fig = go.Figure(data = data, layout = layout)
plotly.offline.iplot(fig,filename='Prosper_loans_parallel')
#plotly.offline.plot(fig, filename = 'Proser_loans.html')
!jupyter nbconvert Prosper_loan_exploration_slide_deck.ipynb --to slides --post serve --template output_toggle
[NbConvertApp] Converting notebook Prosper_loan_exploration_slide_deck.ipynb to slides [NbConvertApp] Writing 12076767 bytes to Prosper_loan_exploration_slide_deck.slides.html [NbConvertApp] Redirecting reveal.js requests to https://cdnjs.cloudflare.com/ajax/libs/reveal.js/3.5.0 Serving your slides at http://127.0.0.1:8000/Prosper_loan_exploration_slide_deck.slides.html Use Control-C to stop this server WARNING:tornado.access:404 GET /custom.css (127.0.0.1) 0.99ms